Abstract

In this paper, a branching process approximation for the spread of a 
Reed-Frost epidemic on a network with tunable clustering is derived. 
The approximation gives rise to expressions for the epidemic threshold 
and the probability of a large outbreak in the epidemic. We investigate
how these quantities vary with the clustering in the graph, and it turns
out for instance that, as the clustering increases, the epidemic threshold 
decreases. The network is modelled by a random intersection graph, in 
which individuals are independently members of a number of groups and 
two individuals are linked to each other if and only if they share at least 
one group. 

Keywords: Epidemics, random graphs, clustering, branching process, epi- 
demic threshold. 

1 Introduction

This paper is concerned with Reed-Frost epidemics modified to take place 
on random networks. Introduced in 1928 by two medical researchers, Lowell 
Reed and Wade Frost, the Reed-Frost model is one of the simplest stochas- 
tic epidemic models. The spread of the infection takes place in generations: 
Each individual that is infective at time t (t = 0,1,...) independently makes 
contacts with all other individuals in the population with some probability p, 
and if a contacted individual is susceptible, it becomes infected at time t + 1. 
Also at time t + 1, the infective individuals from time t are removed from the 
epidemic process. 
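
The generation-wise dynamics just described can be simulated directly. The following is a minimal sketch and not part of the original analysis; the function name `reed_frost_final_size` and its parameters are illustrative assumptions. It uses the fact that, when there are i infectives, a susceptible escapes infection in that generation with probability (1 - p)^i.

```python
import random


def reed_frost_final_size(n, p, rng, initial_infectives=1):
    """Simulate one homogeneously mixing Reed-Frost epidemic.

    Returns the number of individuals ever infected, including the
    initial infectives. All names here are illustrative.
    """
    susceptible = n - initial_infectives
    infective = initial_infectives
    total = initial_infectives
    while infective > 0 and susceptible > 0:
        # A susceptible avoids all current infectives with this probability.
        escape = (1.0 - p) ** infective
        new_cases = sum(1 for _ in range(susceptible) if rng.random() >= escape)
        susceptible -= new_cases
        total += new_cases
        # Infectives from this generation are removed; new cases take over.
        infective = new_cases
    return total
```

Calling the function repeatedly with independent generators, for instance `reed_frost_final_size(100, 0.02, random.Random(seed))`, gives an empirical final size distribution.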

The behavior of the Reed-Frost model is well understood; see e.g. von Bahr
and Martin-Löf (1980). A crucial assumption which simplifies the analysis of
the model is that the population in which the epidemic takes place is taken
to be homogeneously mixing, that is, an infective individual is assumed to 
make contacts with all other individuals in the population with the same 
probability. This assumption is of course very unrealistic, since, in a real-life 
epidemic, an infective individual is much more likely to infect individuals with 
whom he/she has some kind of social connection. The Reed-Frost model can
easily be adapted to incorporate this type of heterogeneity by introducing a
graph to represent the social structure in the population and then stipulating
that infective individuals can only infect their neighbors in the social network;
see Section 3. This modification makes the analysis of the model two-fold.
Firstly, one wants to find a realistic model for the underlying social network, 
and, secondly, one wants to study the behavior of the epidemic on this graph. 

Large complex networks such as social contact structures, the internet and 
various types of collaboration networks have received a lot of attention during 
the last few years; see e.g. Dorogovtsev and Mendes (2003) and Newman et 
al. (2006) and the references therein. As for social networks, one of their most 
striking features is that they are highly clustered, meaning roughly that there 
is a large number of triangles and other short cycles; see e.g. Newman (2003). 
This is a consequence of the fact that friendship circles are typically strongly 
overlapping so that many of our friends are also friends of each other. A 
model that captures this in a natural way is the so called random intersection 
graph, which is described in Section 2. Roughly, the idea of the model is that 
people are members of groups — families, schools, workplaces etc. — and an 
edge is drawn between two individuals if they share at least one group. If 
the relation between the number of individuals and the number of groups is 
chosen appropriately, this leads to a graph where the amount of clustering can 
be tuned by adjusting the parameters of the model. 

An important goal of network modelling is to investigate how the structure 
of the network affects the behavior of various types of dynamic processes on 
the network; see Durrett (2006) for an overview. When it comes to epidemics, 
Andersson (1999) is a comprehensible introduction, in which expressions for 
the epidemic threshold, the probability of a large outbreak and the final size 
of the epidemic are derived in a heuristic way for a number of underlying 
graphs. Here, the epidemic threshold, commonly denoted by $R_0$, is defined as
a function of the parameters of the model such that a large outbreak in the
epidemic has positive probability if and only if $R_0 > 1$. In epidemic modelling,
a common technique for deriving expressions for the epidemic threshold and 
the probability of a large outbreak is to use branching process approximations 
of the early stages of the epidemic. However, when studying epidemics on 
networks, dependencies between the edges in the graph tend to make branching 
process approximations more complicated. Results for epidemics on graphs 
with arbitrary degree distribution can be found in Andersson (1998), and 
Erdős-Rényi graphs and some extensions thereof are dealt with in Neal (2004,
2006). There is, however, very little work done on more complicated graph
structures.

The aim of this paper is to give a rigorous analysis of how clustering in a 
network affects the spread of an epidemic. The network is modelled by a ran- 
dom intersection graph with tunable clustering and we then let a Reed-Frost 
epidemic propagate on this graph. Comparing the epidemic with a certain 
branching process yields (implicit) expressions for the epidemic threshold and 
the probability of a large outbreak. Numerical evaluations reveal that, as the 
clustering increases, the epidemic threshold decreases — that is, large out- 
breaks are possible for larger parts of the parameter space — but also that the 
actual value of the probability of a large outbreak decreases as the clustering 
approaches its maximal value. To our knowledge, this is the first rigorous 
investigation of how the spread of an epidemic is affected by clustering. 

In Newman (2003:2), the effect of clustering on epidemics is studied by 
heuristic means, and calculations therein indicate that indeed the epidemic 
threshold should decrease as the clustering increases. Furthermore, Trapman
(2007) studies epidemics on graphs with a given expected number of triangles, 
but the construction of the graph is more involved there. We mention also the 
work by Ball et al. (1997) on the so called household model, which describes the 
spread of an epidemic in a population with group structure. The model there 
however is not formulated in terms of an underlying graph and the concept of 
clustering is not considered. 

The paper is organized as follows. In Section 2, random intersection graphs 
and their properties are described in more detail. Section 3 contains the main 
result — a comparison of a Reed-Frost epidemic on a random intersection 
graph with a branching process — and its proof. In Section 4, the final size 
of the epidemic is commented on. It is observed that a thinned random in- 
tersection graph is in fact not a random intersection graph, implying that 
results concerning the component structure in a random intersection graph 
cannot be used to draw conclusions about the final size of the epidemic. In 
Section 5, numerical results are presented and, finally, Section 6 contains a 
short discussion. 

2 Random intersection graphs 

Random intersection graphs were introduced in Singer (1995) and Karoński
et al. (1999). In its simplest form, the model is defined as follows: Given a
set $V$ of $n$ vertices and a set $A$ of $m$ auxiliary vertices, construct a bipartite
graph $\mathcal{B}_{n,m,r}$ by letting each edge between vertices $v \in V$ and $a \in A$ exist
independently with probability $r$. The random intersection graph $\mathcal{G}_{n,m,r}$ with
vertex set $V$ is obtained by connecting two vertices $v, w \in V$ if and only if
there is a vertex $a \in A$ such that $a$ is linked to both $v$ and $w$ in $\mathcal{B}_{n,m,r}$. This
construction can be generalized in various ways (see e.g. Godehardt and
Jaworski (2002) and Deijfen and Kets (2007)), but in this paper we will stick
to the above formulation. We will also specialize to the case when $m = \lfloor \beta n^{\alpha} \rfloor$
for some constants $\alpha, \beta > 0$; see Karoński et al. (1999) for a motivation of this
choice of $m$. In fact, to get a graph with tunable clustering, we will soon take
$\alpha = 1$.

If the vertices in V and A are thought of as individuals and groups re- 
spectively, then the random intersection graph provides a model for a social 
network where individuals are connected if there is at least one group where 
they are both members. The probability that two individuals do not share
any group is $(1 - r^2)^m$, implying that the edge probability in the random
intersection graph is $1 - (1 - r^2)^m$, and hence the expected degree of a fixed
vertex is

$$(n - 1)\left(1 - (1 - r^2)^m\right) = \beta r^2 n^{1+\alpha} + O(r^4 n^{1+2\alpha}).$$

To keep this expression bounded as $n \to \infty$, we let $r = \gamma n^{-(1+\alpha)/2}$ for some
$\gamma > 0$. The expected degree then tends to $\beta\gamma^2$ as $n \to \infty$.
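
The construction above translates into a few lines of code. The following is a minimal sketch under the stated scaling, with m equal to the integer part of beta * n^alpha and r = gamma * n^(-(1 + alpha) / 2); the function names `scaled_parameters` and `random_intersection_graph` are illustrative assumptions, and the quadratic-time construction is chosen for clarity rather than efficiency.

```python
import math
import random
from itertools import combinations


def scaled_parameters(n, alpha, beta, gamma):
    """Return (m, r) for the scaling m = floor(beta n^alpha), r = gamma n^(-(1+alpha)/2)."""
    m = math.floor(beta * n ** alpha)
    r = min(1.0, gamma * n ** (-(1.0 + alpha) / 2.0))
    return m, r


def random_intersection_graph(n, m, r, rng):
    """Sample the bipartite membership structure and the induced graph.

    groups[a] is the set of individuals linked to auxiliary vertex a;
    edges contains a pair (v, w) with v < w whenever v and w share a group.
    """
    groups = [{v for v in range(n) if rng.random() < r} for _ in range(m)]
    edges = set()
    for members in groups:
        for v, w in combinations(sorted(members), 2):
            edges.add((v, w))
    return groups, edges
```

For example, `m, r = scaled_parameters(1000, 1.0, 0.25, 4.0)` followed by `random_intersection_graph(1000, m, r, random.Random(0))` samples a graph whose mean degree should be close to beta * gamma^2 = 4.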

As for the asymptotic distribution of the vertex degree with the above
choices of $m$ and $r$, it is shown in Stark (2004) to be a point mass at 0
for $\alpha < 1$, a compound Poisson distribution, describing the law of a sum
of a Poisson($\beta\gamma$) distributed number of independent Poisson($\gamma$) variables, for
$\alpha = 1$, and a Poisson($\beta\gamma^2$) distribution for $\alpha > 1$. To see this, note that the
number of groups that an individual belongs to is binomially distributed with
mean $mr = \beta\gamma n^{(\alpha-1)/2}$. For $\alpha < 1$, this goes to 0 as $n \to \infty$, explaining the
point mass at 0. For $\alpha = 1$, the number of group memberships per individual
is asymptotically Poisson($\beta\gamma$) distributed, and the sizes of the groups are
Poisson($\gamma$), with overlaps between groups being very unlikely if $n$ is large,
indicating that the degree distribution should indeed be compound Poisson.
When $\alpha > 1$, the number of groups that each individual belongs to tends to
infinity as $n \to \infty$. This means that the edge indicators are asymptotically
independent, which suggests a Poisson distribution for the vertex degree. In fact,
for $\alpha > 1$, the random intersection graph is similar to the standard Erdős-Rényi
random graph; see Fill et al. (2000).

Moving on to the clustering in the graph, for two vertices $v, w \in V$, let $I_{vw}$
denote the edge indicator for the edge between $v$ and $w$ in $\mathcal{G}_{n,m,r}$, and write
$P_n$ for the probability measure of $\mathcal{G}_{n,m,r}$. We then define the clustering as

$$c = c_{\alpha,\beta,\gamma} := \lim_{n \to \infty} P_n(I_{vw} = 1 \mid I_{vu} I_{wu} = 1),$$

that is, $c$ is the limiting conditional probability that there is an edge between
two vertices $v$ and $w$, given that they have a common neighbor $u$. The expected
number of groups that individual $u$ belongs to is $\beta\gamma n^{(\alpha-1)/2}$, which goes to 0,
$\beta\gamma$ or infinity, depending on whether $\alpha < 1$, $\alpha = 1$ or $\alpha > 1$. As a consequence,
the limiting probability that two individuals $v$ and $w$ who both share a group
with $u$ in fact share the same group with $u$ (thus being connected to each
other) will behave differently depending on the value of $\alpha$. More specifically,
it is shown in Deijfen and Kets (2007) that

$$c = \begin{cases} 1 & \text{if } \alpha < 1; \\ (1 + \beta\gamma)^{-1} & \text{if } \alpha = 1; \\ 0 & \text{if } \alpha > 1. \end{cases}$$




In view of the result from Stark (2004) concerning the degree distribution 
and the characterization of the clustering from Deijfen and Kets (2007), the 
best choice if we want to use a random intersection graph to describe a social 
network seems to be $\alpha = 1$. This gives rise to a model where both the mean
degree and the clustering can be tuned by adjusting the parameters $\beta$ and $\gamma$.
More precisely, with $D$ denoting the limiting degree of a fixed vertex, we have
that

$$E[D] = \beta\gamma^2 \quad \text{and} \quad c = (1 + \beta\gamma)^{-1}.$$

For the remainder of this paper we fix $\alpha = 1$ and write $\mathcal{G}^{(n)} = \mathcal{G}^{(n)}_{\beta,\gamma}$ and
$\mathcal{B}^{(n)} = \mathcal{B}^{(n)}_{\beta,\gamma}$ for the corresponding random intersection graph and its
underlying bipartite graph (omitting the subscripts when the dependence on $\beta$ and $\gamma$
does not need to be emphasized).
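
Since the mean degree and the clustering are explicit functions of the two parameters, the relations can be inverted: a target pair (mean degree, clustering) determines beta and gamma. The following is a small sketch of that inversion; the function name `params_from_mean_degree_and_clustering` is an illustrative assumption. Writing mu for the mean degree, the relations mu = beta * gamma^2 and c = 1 / (1 + beta * gamma) give beta * gamma = (1 - c) / c, hence gamma = mu * c / (1 - c) and beta = (1 - c)^2 / (mu * c^2).

```python
def params_from_mean_degree_and_clustering(mu, c):
    """Solve mu = beta * gamma**2 and c = 1 / (1 + beta * gamma) for (beta, gamma)."""
    if not (mu > 0.0 and 0.0 < c < 1.0):
        raise ValueError("need mu > 0 and 0 < c < 1")
    gamma = mu * c / (1.0 - c)
    beta = (1.0 - c) ** 2 / (mu * c ** 2)
    return beta, gamma
```

For instance, mean degree 4 and clustering 0.5 correspond to beta = 0.25 and gamma = 4.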



3 The epidemic model and an approximating branching process

Consider a closed homogeneous population consisting of $n$ individuals,
labelled $v_1, \ldots, v_n$, with a social structure represented by a random intersection
graph $\mathcal{G}^{(n)}$. We will use the Reed-Frost dynamics to describe the spread of
an infection in this population. The social graph is assumed to be fixed
throughout the spread of the infection. Furthermore, for simplicity, we start
with one single randomly selected infective individual at time 0, the rest of the
population being susceptible. Without loss of generality, we assume that the
initial infective, which will be referred to as the index case, is individual $v_1$.
An individual that is infective at time $t$ ($t = 0, 1, \ldots$) contacts each one of its
neighbors in $\mathcal{G}^{(n)}$ independently with some probability $p$, and if a contacted
neighbor is susceptible, it becomes infective at time t + 1. The individuals 
that were infective at time t are removed from the epidemic process at time 
t + 1 (by immunity or death) and take no further part in the spread of the 
infection. 
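
The graph version of the dynamics can be sketched in code as follows. This is only an illustration under the assumption that the graph is given as a dictionary `adjacency` mapping each vertex to the set of its neighbors; the function name `reed_frost_on_graph` is hypothetical.

```python
import random


def reed_frost_on_graph(adjacency, p, index_case, rng):
    """Run Reed-Frost dynamics on a fixed graph and return the set ever infected."""
    infected_ever = {index_case}
    current = [index_case]
    while current:
        next_generation = set()
        for v in current:
            for w in sorted(adjacency[v]):
                # Only susceptible neighbors can be infected, each contact
                # succeeding independently with probability p.
                if w in infected_ever or w in next_generation:
                    continue
                if rng.random() < p:
                    next_generation.add(w)
        # The current infectives are removed; the new cases form the next generation.
        infected_ever |= next_generation
        current = sorted(next_generation)
    return infected_ever
```

Each infective gets exactly one chance to infect each susceptible neighbor, which is what makes the percolation representation used later in this section possible.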

We will be concerned with the set $\mathcal{E}^{(n)}$ of individuals that are ultimately
affected by the above epidemic. More precisely, we will construct a branching
process that can be used to determine whether $\mathcal{E}^{(n)}$ is finite or infinite in the
limit as $n \to \infty$. To this end, first note that $\mathcal{E}^{(n)}$ can be identified with the
cluster containing the index case in an edge percolation process on $\mathcal{G}^{(n)}$ in
which each edge is open independently with probability $p$. Open edges in
the percolation process are interpreted as possible transmission links for the
disease, that is, if one of the vertices of an edge is infective at time $t$ and
the other one is not, then the uninfected vertex becomes infective at time
$t + 1$. Furthermore, if we consider the percolation cluster of a particular
vertex restricted to a subgraph of $\mathcal{G}^{(n)}$, then the size of this cluster has the
same distribution as the final size of a Reed-Frost epidemic on the subgraph. In
particular, if $R_k$ is the size of the percolation cluster of a given vertex belonging
to a complete subgraph with $k$ nodes, then the distribution of $R_k$, denoted
by $F_k$, is that of a Reed-Frost epidemic initiated by one single individual in
a homogeneously mixing population of size $k$. The distribution $F_k$ can be
computed recursively; see Andersson and Britton (2000: Section 1.2).
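
One standard way to carry out such a recursion is sketched below; it is an illustration and not necessarily the exact formula of the cited reference. It rests on the percolation picture: with one initial infective and s initial susceptibles, a given set of j susceptibles is exactly the set infected if all of them are reached within the set and the remaining s - j escape all 1 + j infected individuals. The function name `reed_frost_final_size_pmf` is an assumption; the returned list gives the distribution of the number of initial susceptibles infected, so the total outbreak size in a population of k = s + 1 individuals is one more than that.

```python
from math import comb


def reed_frost_final_size_pmf(n_susceptible, p):
    """pmf[j] = P(exactly j of the initial susceptibles are ultimately infected),
    for a Reed-Frost epidemic started by one infective."""
    q = 1.0 - p
    # full[s] = P(all s susceptibles are infected) when there are exactly s of them.
    full = []
    for s in range(n_susceptible + 1):
        partial = sum(
            comb(s, j) * full[j] * q ** ((s - j) * (1 + j)) for j in range(s)
        )
        full.append(max(0.0, 1.0 - partial))
    n = n_susceptible
    return [
        comb(n, j) * full[j] * q ** ((n - j) * (1 + j)) for j in range(n + 1)
    ]
```

As a check, with two susceptibles and p = 0.5 the sketch gives probabilities 0.25, 0.25 and 0.5 for zero, one and two further infections.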

We now define the branching process that will be used to approximate the
epidemic process. To begin with, note that the groups in a random intersection
graph generate complete subgraphs with sizes that are asymptotically
Poisson($\gamma$) distributed. Hence, in the limit as $n \to \infty$, the size $R$ of an outbreak
started by a given individual in a given group has distribution

$$R \sim F := \sum_{k=0}^{\infty} F_k \frac{\gamma^k}{k!} e^{-\gamma}. \qquad (1)$$

Recall that the number of groups that a given individual is a member of is
asymptotically Poisson($\beta\gamma$) distributed. Let $f$ be the generating function of a
sum of a Poisson($\beta\gamma$) number of i.i.d. variables $R_1, R_2, \ldots$, all distributed as
$R$, that is,

$$f(s) = \exp\{\beta\gamma (E[s^R] - 1)\}, \qquad (2)$$

and let $\{Z(t) : t \geq 0\}$ be a discrete time branching process with offspring
generating function $f$, that is, $E[s^{Z(1)}] = f(s)$. Finally, write $E$ for the total
progeny in such a process, that is,

$$E = \sum_{t=0}^{\infty} Z(t).$$

Let $E^{(n)} = |\mathcal{E}^{(n)}|$ denote the final size of a Reed-Frost epidemic on a random
intersection graph $\mathcal{G}^{(n)}$. Our main result is the following theorem, which will
be proved by relating the initial phases of the epidemic to a branching process
with the same distribution as $\{Z(t) : t \geq 0\}$ as $n \to \infty$.

Theorem 3.1 As $n \to \infty$, we have that $E^{(n)} \to E$ in distribution.

Define $\rho$ to be the smallest non-negative root of the equation $f(\rho) = \rho$.
It follows from standard results in branching process theory that
$P(E = \infty) = 1 - \rho$ and that $\rho < 1$ if and only if $E[Z(1)] > 1$; see e.g. Athreya and
Ney (1972). Combining this with Theorem 3.1 gives the following corollary
concerning the asymptotic behavior of the epidemic.

Corollary 3.2 Define $R_0 := E[Z(1)] = \beta\gamma E[R]$ and write $\pi = 1 - \rho$. As
$n \to \infty$, we have that

(a) $E^{(n)} \to \infty$ with probability $\pi$;

(b) $\pi > 0$ if and only if $R_0 > 1$.
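
The two quantities in the corollary can be evaluated numerically. The sketch below is one possible way of doing so and rests on assumptions that should be checked against the intended definitions: a group is taken to contain a Poisson(gamma) number of members besides the individual who brings the infection in, R counts only those other members, the Poisson distribution is truncated at a hypothetical cut-off `k_max`, and the smallest root of f(s) = s is found by iterating f from 0. The convention for R is chosen because it reproduces the limits reported in Section 5. The function name `threshold_and_outbreak_probability` is illustrative.

```python
from math import comb, exp, sqrt


def threshold_and_outbreak_probability(beta, gamma, p, tol=1e-12, max_iter=100000):
    """Return (R0, outbreak probability) for the approximating branching process."""
    q = 1.0 - p
    k_max = int(gamma + 10.0 * sqrt(gamma) + 30.0)  # assumed Poisson truncation point
    # full[s] = P(all s susceptibles infected | one initial infective, s susceptibles).
    full = []
    for s in range(k_max + 1):
        partial = sum(
            comb(s, j) * full[j] * q ** ((s - j) * (1 + j)) for j in range(s)
        )
        full.append(max(0.0, 1.0 - partial))
    # pmf of R: mix local outbreak sizes over a Poisson(gamma) number of other members.
    r_pmf = [0.0] * (k_max + 1)
    weight = exp(-gamma)  # Poisson probability of k = 0
    for k in range(k_max + 1):
        for j in range(k + 1):
            r_pmf[j] += weight * comb(k, j) * full[j] * q ** ((k - j) * (1 + j))
        weight *= gamma / (k + 1)
    r0 = beta * gamma * sum(j * prob for j, prob in enumerate(r_pmf))

    def f(s):
        # Offspring generating function (2) with the truncated distribution of R.
        g = sum(prob * s ** j for j, prob in enumerate(r_pmf))
        return exp(beta * gamma * (g - 1.0))

    # Iterating f from 0 increases monotonically to the smallest fixed point.
    rho = 0.0
    for _ in range(max_iter):
        nxt = f(rho)
        if abs(nxt - rho) < tol:
            rho = nxt
            break
        rho = nxt
    return r0, 1.0 - rho
```

With beta = 0.25, gamma = 4 and p = 1, every group member is infected, so the sketch returns an R0 equal to the mean degree 4 up to truncation error.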

Before we continue with the proof of the theorem, we will state and prove
a lemma concerning the bipartite graph $\mathcal{B}^{(n)}$. To this end, for an arbitrary
graph $\mathcal{G}$ with vertex set $W$, the subgraph of $\mathcal{G}$ induced by some subset $W' \subset W$
is defined to be the subgraph consisting of the vertices in $W'$ together with all
edges in $\mathcal{G}$ that run between vertices in $W'$. Let $\mathcal{C}^{(n)}(t)$ be the vertices of $\mathcal{B}^{(n)}$
at distance at most $t$ from vertex $v_1$. Note that a vertex in $\mathcal{C}^{(n)}(t)$ may be either an
individual (that is, a vertex $v \in V$) or a group (that is, an auxiliary vertex
$a \in A$), and that vertices at odd distance from $v_1$ correspond to groups and
vertices at even distance to individuals.

Lemma 3.3 Let $\kappa > 0$ be such that $1/\kappa > 2 \log(\beta\gamma^2)$. As $n \to \infty$, the
probability that the subgraph of $\mathcal{B}^{(n)}$ induced by $\mathcal{C}^{(n)}(\lfloor \kappa \log n \rfloor)$ is a tree tends
to 1.

Proof of Lemma 3.3. We will build up $\mathcal{C}^{(n)}(t)$ by a sequence $\{\mathcal{D}^{(n)}(t) : t \geq 0\}$,
constructed in such a way that $\mathcal{C}^{(n)}(t) = \bigcup_{0 \leq s \leq t} \mathcal{D}^{(n)}(s)$. For odd $t$, the
set $\mathcal{D}^{(n)}(t)$ will consist of groups and for even $t$ of individuals. To begin
with, by definition, we have $\mathcal{C}^{(n)}(0) = \{v_1\}$, so necessarily $\mathcal{D}^{(n)}(0) = \mathcal{C}^{(n)}(0)$.
For odd $t$, the set $\mathcal{D}^{(n)}(t)$ is then constructed by choosing, independently
for each individual in $\mathcal{D}^{(n)}(t - 1)$, a Binomial$(m, \gamma/n)$ distributed number of
distinct groups in $A$, and, likewise, for even $t$, we construct $\mathcal{D}^{(n)}(t)$ by choosing,
independently for each group in $\mathcal{D}^{(n)}(t - 1)$, a Binomial$(n, \gamma/n)$ distributed
number of distinct individuals in $V$. Let $X^{(n)}$ be a compound binomial random
variable with generating function

$$E[s^{X^{(n)}}] = \left(1 - \frac{\gamma}{n} + \frac{\gamma}{n}\left(1 - \frac{\gamma}{n} + \frac{\gamma}{n} s\right)^{n}\right)^{m},$$

and let $\{X^{(n)}(t) : t \geq 0\}$ be a branching process with offspring distribution
given by $X^{(n)}$. Furthermore, write $Y^{(n)}(t) = \sum_{s=0}^{t} X^{(n)}(s)$ for the total progeny of the
branching process at time $t$. Then, for even $t$, the number of individuals (not
necessarily distinct) that have been chosen in the construction of the process
$\mathcal{C}^{(n)}(t)$ has the same distribution as $Y^{(n)}(t/2)$, and the number of groups (not
necessarily distinct) that have been chosen is (stochastically) strictly smaller
than $Y^{(n)}(t/2)$.

Define $\mu_n := E[X^{(n)}] = \gamma^2 m/n$ and note that $\beta\gamma^2 (1 - \frac{1}{\beta n}) < \mu_n \leq \beta\gamma^2$,
so $\mu_n \to \mu := \beta\gamma^2$ as $n \to \infty$. By a well known result in branching process
theory, we have that $\mu_n^{-t} X^{(n)}(t) \to W^{(n)}$ almost surely as $t \to \infty$, where $W^{(n)}$
is a random variable with $W^{(n)} \equiv 0$ if and only if $\mu_n \leq 1$; see Athreya and Ney
(1972). Furthermore, $W^{(n)} \to W$ in distribution as $n \to \infty$, where $W$ is the
corresponding limiting random variable for the branching process $\{X(t) : t \geq 0\}$
with offspring generating function $E[s^{X(1)}] = \exp\{\beta\gamma (e^{\gamma(s-1)} - 1)\}$. Thus

$$X^{(n)}(t) = \mu_n^t (W^{(n)} + o_t(1)) = \mu_n^t (W + o_n(1) + o_t(1)) \leq \mu^t (O_n(1) + O_t(1)),$$

where $o_x(\cdot)$ and $O_x(\cdot)$ are the usual order notation when $x \to \infty$. It follows
that $Y^{(n)}(t) \leq \mu^t (O_n(1) + O_t(t))$, and, when we set $t = \lfloor \kappa \log n \rfloor$ with
$1/\kappa > \max\{2 \log \mu, 0\}$, we get

$$Y^{(n)}(\lfloor \kappa \log n \rfloor) \leq e^{\log \mu \lfloor \kappa \log n \rfloor} O_n(\log n) = o_n(\sqrt{n}).$$
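
The last step can be spelled out as follows; this short computation is added for convenience and only uses the condition on $\kappa$ stated above.

```latex
% For \mu > 1, using \lfloor \kappa \log n \rfloor \le \kappa \log n:
\mu^{\lfloor \kappa \log n \rfloor}
  \le e^{\kappa \log \mu \cdot \log n}
  = n^{\kappa \log \mu},
\qquad \text{where } \kappa \log \mu < \tfrac{1}{2}.
% Hence, with \delta := \tfrac{1}{2} - \kappa \log \mu > 0,
\frac{n^{\kappa \log \mu} \log n}{\sqrt{n}} = n^{-\delta} \log n \to 0
\quad \text{as } n \to \infty.
% For \mu \le 1 one has \mu^{t} \le 1, and \log n = o(\sqrt{n}) directly.
```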

Now note that, if all individuals and groups that have been chosen in the
construction of $\mathcal{C}^{(n)}(t)$ are distinct, then clearly the subgraph of $\mathcal{B}^{(n)}$ induced
by $\mathcal{C}^{(n)}(t)$ is a tree. Thus the probability in the statement of the lemma is
greater than 




$= \exp\{o(1)\} \to 1$,

and the lemma is proved. □ 

Proof of Theorem 3.1. The idea of the proof is to construct a branching
process $\{Z^{(n)}(t) : t \geq 0\}$ that counts the number of individuals infected by
the epidemic in its initial stage, though not necessarily in chronological order.
The branching process will be defined in such a way that, if it goes extinct,
then its total progeny will be equal to the final size of the epidemic, while, if
it explodes, the epidemic will have infected a number of individuals which is
increasing polynomially in $n$. As $n \to \infty$, we will have $Z^{(n)} \to Z$, where $Z$
is the branching process in the formulation of the theorem, and the theorem
thus follows.

First, we describe the initial spread of the disease among the
individuals/groups in the set $\mathcal{C}^{(n)}(\lfloor \kappa \log n \rfloor)$ with a process $\{\mathcal{E}^{(n)}(t) : 0 \leq t \leq \lfloor \kappa \log n \rfloor\}$.
We only consider the case when the subgraph of $\mathcal{B}^{(n)}$ induced by $\mathcal{C}^{(n)}(\lfloor \kappa \log n \rfloor)$
is a tree. By Lemma 3.3, the probability of the complementary event tends to
zero as $n \to \infty$, so we can disregard it. Furthermore, our construction will
be such that $\mathcal{E}^{(n)}(t) \subset \mathcal{C}^{(n)}(t)$ for all $t$, implying that the nodes of $\mathcal{E}^{(n)}(\lfloor \kappa \log n \rfloor)$
constitute a tree if seen as a subgraph of $\mathcal{B}^{(n)}$.

The construction of $\mathcal{E}^{(n)}(t)$ is similar to the construction of $\mathcal{C}^{(n)}(t)$
described in Lemma 3.3. Namely, we will define a sequence $\{\mathcal{F}^{(n)}(t) : 0 \leq t \leq \lfloor \kappa \log n \rfloor\}$
and then set $\mathcal{E}^{(n)}(t) = \bigcup_{0 \leq s \leq t} \mathcal{F}^{(n)}(s)$. To this end, first let $\mathcal{F}^{(n)}(0)$
consist of the initial infective, that is, $\mathcal{F}^{(n)}(0) = \{v_1\}$. Then, for odd $t$, let
$\mathcal{F}^{(n)}(t)$ consist of the groups of the individuals in $\mathcal{F}^{(n)}(t - 1)$ that are at
distance $t$ from $v_1$, that is,

$$\mathcal{F}^{(n)}(t) = \{a \in \mathcal{D}^{(n)}(t) : \exists v \in \mathcal{F}^{(n)}(t - 1) \text{ such that } v \in a\}.$$

To define $\mathcal{F}^{(n)}(t)$ for even $t$, recall the percolation representation of the set
of ultimately infected individuals in $\mathcal{G}^{(n)}$ described before the theorem. For
two vertices $v, w$ belonging to a group $a$, write $v \stackrel{a}{\leftrightarrow} w$ for the event that there
exists a path of open edges (that is, edges that can be used for disease
transmission) connecting $v$ and $w$, with the additional property that the
whole path is contained in group $a$. Furthermore, let $\mathcal{K}_{a,v} = \{w \in a : v \stackrel{a}{\leftrightarrow} w\}$.
This is to be thought of as the local outbreak in group $a$ caused by individual
$v$, if $v$ itself becomes infected from outside of group $a$. As pointed out before
the formulation of the theorem, given that $|a| = k$, we have that $|\mathcal{K}_{a,v}| \sim F_k$,
where $F_k$ is the distribution of the final size of a homogeneous Reed-Frost
epidemic initiated by a single individual in a population of size $k$. Note that
$|a| \sim$ Binomial$(n, \gamma/n)$, and, for future use, let $R^{(n)}$ be a random variable
with distribution $\sum_k F_k P(|a| = k)$, that is, the size of a local outbreak in a
group, not conditioning on the group size. Now, for even $t$, define $\mathcal{F}^{(n)}(t)$ to
be the individuals infected in the local outbreaks caused by the individuals in
$\mathcal{F}^{(n)}(t - 2)$, that is,

$$\mathcal{F}^{(n)}(t) = \{w \in \mathcal{K}_{a,v} : a \in \mathcal{F}^{(n)}(t - 1), v \in \mathcal{F}^{(n)}(t - 2)\}.$$

We will now study the growth of $|\mathcal{E}^{(n)}(t)|$. To this end, for $0 \leq t \leq \frac{1}{2}\lfloor \kappa \log n \rfloor$,
define $Z^{(n)}(t) = |\mathcal{F}^{(n)}(2t)|$. Then, since the subgraph of $\mathcal{B}^{(n)}$ induced by
$\mathcal{E}^{(n)}(2t)$ is a tree for $t \leq \frac{1}{2}\lfloor \kappa \log n \rfloor$, by construction, $Z^{(n)}(t)$ is a branching
process with a compound binomial offspring distribution. The generating
function of the offspring distribution is

$$f_n(s) = E[s^{Z^{(n)}(1)}] = \left(1 - \frac{\gamma}{n} + \frac{\gamma}{n} E[s^{R^{(n)}}]\right)^{m}. \qquad (3)$$

For $t > \frac{1}{2}\lfloor \kappa \log n \rfloor$ we let $Z^{(n)}(t)$ evolve by the same branching mechanism,
that is, as a discrete time branching process with offspring distribution defined
by (3). For $t > \frac{1}{2}\lfloor \kappa \log n \rfloor$, however, $Z^{(n)}(t)$ is no longer related to the epidemic
process.

Let $\rho_n$ be the smallest non-negative root of $\rho_n = f_n(\rho_n)$, and define
$A_n = \{\lim_{t \to \infty} Z^{(n)}(t) = 0\}$ (the extinction set) and $T_n = \inf\{t : Z^{(n)}(t) = 0\}$ (the
extinction time). Then $P(A_n) = \rho_n$, and, since $T_n$ is finite on $A_n$, the total
progeny of the branching process, $\sum_{t=0}^{\infty} Z^{(n)}(t)$, is finite on $A_n$. As $n \to \infty$, we
have that $Z^{(n)} \to Z$ in distribution and $\rho_n \to \rho$, where $Z(t)$ is the branching
process appearing in the theorem and $\rho$ the equivalent of $\rho_n$ in this process.
Furthermore, $T_n \to T$ in distribution, where $T = \inf\{t : Z(t) = 0\}$. It
follows that $E = \sum_{t=0}^{\infty} Z(t)$ is finite on the event $A = \{\lim_{t \to \infty} Z(t) = 0\}$.
This implies that, on $A_n$, the final size $E^{(n)}$ of the epidemic converges to
$E$ as $n \to \infty$. On the complementary sets $A_n^c$, the process $Z^{(n)}(t)$ grows
exponentially in $t$. More precisely,

$$Z^{(n)}(\lfloor \kappa \log n \rfloor) = n^{\kappa \log R_0} (W' + o(1)),$$

where $W' > 0$ due to the conditioning on explosion (recall the proof of Lemma
3.3). Furthermore, we have that $Z^{(n)}(\lfloor \kappa \log n \rfloor) \leq E^{(n)}$ on $A_n^c$, and thus
$E^{(n)} \to \infty$ on $A_n^c$. This proves the theorem. □

4 The final outcome of the epidemic 

The branching process approach from the previous section gives basically no 
information on the behavior of the epidemic in the case of explosion. In this 
section we will elaborate a bit on this problem. 

As already described, one way of getting a grip on the final outcome of the
epidemic is to consider an edge percolation process on the underlying graph,
where each edge in the graph is independently removed with probability $1 - p$
and kept with probability p. The vertices that belong to the component of the 
initial infective in the graph so obtained correspond to the individuals that 
have experienced the infection at the end of the epidemic. If the structure 
of the thinned graph is known, then this observation might be useful in in- 
vestigating the final size of the epidemic. For instance, if there is a unique 
giant component in the thinned graph — that is, if the outcome of the per- 
colation process contains a unique cluster of order n — then the relative size 
of this component gives the probability of an outbreak of order n in the epi- 
demic. Such an outbreak is often referred to as a major outbreak, and, in 
most epidemic models, the probability of such an outbreak coincides with the 
probability of explosion in the branching process describing the initial stages 
of the epidemic (denoted by $\pi$ in this paper). This, however, requires additional
arguments.

In our case, the social network is a random intersection graph with $\alpha = 1$.
Unfortunately, to date there are no rigorous results concerning the component
structure in a random intersection graph with $\alpha = 1$, but see Behrisch (2007)
for results when $\alpha \neq 1$. Also, in Newman (2003:2), (implicit) expressions
for the size of the largest component in a random graph construction which 
is similar to the random intersection graph are derived by heuristic means 
and it is observed that the relative final size of the giant component seems to 
decrease as the clustering in the graph increases. An argument in support of 
the claim that high clustering in a graph causes the components to be small is 
the following: Consider an arbitrary graph with $n$ vertices and $k = O(n)$ edges
and assume that the clustering equals 1. This implies that all connected components
are complete subgraphs. Hence, with $n_{\max}$ denoting the size of the largest component, we have
that the number of edges in the largest component is $\binom{n_{\max}}{2}$. It follows that
$n_{\max} \leq O(\sqrt{k}) = O(\sqrt{n})$, that is, the relative size of the largest component
tends to zero.
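
The bound on the size of the largest complete subgraph follows from a one-line computation, written out here for convenience under the assumptions of the argument above (clustering equal to 1 and $k = O(n)$ edges).

```latex
% The largest complete subgraph cannot use more than the k available edges:
\binom{n_{\max}}{2} = \frac{n_{\max}(n_{\max} - 1)}{2} \le k
\quad \Longrightarrow \quad
n_{\max} \le \frac{1 + \sqrt{1 + 8k}}{2} = O(\sqrt{k}).
% With k = O(n) this gives
\frac{n_{\max}}{n} = O(n^{-1/2}) \to 0 \quad \text{as } n \to \infty.
```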

Indeed, the lack of rigorous results concerning the components in a random
intersection graph with $\alpha = 1$ makes it harder to study the final size of an
epidemic on such a graph. A second complicating circumstance is that thinning
a random intersection graph gives rise to a graph that no longer belongs to the
class of random intersection graphs; see the proposition below. This means
that, even if there were results for the component structure, these would
not be applicable to a thinned graph. Hence it remains an open problem to
quantify the final outcome of the epidemic.

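Proposition 4.1 Let $\Theta_p(\mathcal{G}^{(n)}_{\beta,\gamma})$ denote the graph generated by removing edges
in $\mathcal{G}^{(n)}_{\beta,\gamma}$ independently with probability $1 - p$. There do not exist $\beta' = \beta'(\beta,\gamma,p)$
and $\gamma' = \gamma'(\beta,\gamma,p)$ such that $\Theta_p(\mathcal{G}^{(n)}_{\beta,\gamma}) = \mathcal{G}^{(n)}_{\beta',\gamma'}$ for every $n$.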

Proof. The idea of the proof is to observe that certain types of subgraphs
will appear with different frequency in $\Theta_p(\mathcal{G}^{(n)}_{\beta,\gamma})$ as compared to $\mathcal{G}^{(n)}_{\beta,\gamma}$. The
subgraph that we will consider consists of four vertices and five edges: 



Write $K_4'$ for this graph type, and note that it can be obtained for instance by
removing one edge from a complete subgraph with four vertices, a graph type
that we denote by $K_4$. Furthermore, we introduce the term vertex-induced
subgraph, for a subgraph of some given graph such that the subgraph consists 
of a subset of the vertices in the original graph together with all edges between 
these vertices that are present in the original graph. 

The number $X_4$ of vertex-induced subgraphs of type $K_4$ in the random
intersection graph $\mathcal{G}^{(n)}_{\beta,\gamma}$ dominates the number of groups of size four in the
construction of the random intersection graph. Since the size of a fixed
group is Binomial$(n, \gamma/n)$ distributed, the number of groups of size four is
Binomial$(\lfloor \beta n \rfloor, \binom{n}{4} (\gamma/n)^4 (1 - \gamma/n)^{n-4})$ distributed, and hence $E[X_4]$ is at
least of order $n$. It follows that the number $X_4'(p)$ of vertex-induced subgraphs of type $K_4'$ in
the thinned graph $\Theta_p(\mathcal{G}^{(n)}_{\beta,\gamma})$ is also at least of order $n$, since, as mentioned,
one way of obtaining graphs of type $K_4'$ is to remove one edge in graphs of
type $K_4$, that is, $E[X_4'(p)] \geq (1 - p) p^5 E[X_4]$, which is at least of order $n$.

Now consider the number $X_4'$ of vertex-induced subgraphs of type $K_4'$ in
the random intersection graph $\mathcal{G}^{(n)}_{\beta,\gamma}$. This number is related to the number
of ways that four individuals $v_1, \ldots, v_4$ can be assigned to different groups so
that a graph of type $K_4'$ is obtained. Consider for instance the following graph
of type $K_4'$:

[Figure: a graph of type $K_4'$ on the vertices $v_1, v_2, v_3, v_4$, in which every pair of vertices is joined by an edge except $v_2$ and $v_3$.]

Write $\{(v_{i_1}, v_{i_2})(v_{j_1}, v_{j_2})(v_{k_1}, v_{k_2})\}$ for the event that $v_{i_1}$ and $v_{i_2}$ share a group,
that $v_{j_1}$ and $v_{j_2}$ share another group and that $v_{k_1}$ and $v_{k_2}$ share yet another
group. Then, for the above graph to arise, the individuals $v_2$ and $v_3$ cannot
share any group (the probability that they avoid doing so goes to 1 as $n \to \infty$)
and, in addition, one of the following events must occur:

$\{(v_1, v_2, v_4)(v_1, v_3, v_4)\}$
$\{(v_1, v_2, v_4)(v_1, v_3)(v_3, v_4)\}$
$\{(v_1, v_3, v_4)(v_1, v_2)(v_2, v_4)\}$
$\{(v_1, v_2)(v_1, v_4)(v_2, v_4)(v_1, v_3)(v_3, v_4)\}$.

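It follows that

$$E[X_4'] \leq n^4 \binom{\lfloor \beta n \rfloor}{2} (\gamma/n)^{6} + 2 n^4 \binom{\lfloor \beta n \rfloor}{3} (\gamma/n)^{7} + n^4 \binom{\lfloor \beta n \rfloor}{5} (\gamma/n)^{10} = O(1).$$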

The number of vertex-induced subgraphs of type $K_4'$ in a random intersection
graph is hence bounded in expectation, while, in a thinned random intersection graph, it is
at least of order $n$. This proves the proposition. □

5 Numerical results 

In this section, we investigate numerically the epidemic threshold $R_0$ and the
probability $\pi$ of explosion in the epidemic; recall Corollary 3.2. We have
$R_0 = \beta\gamma E[R]$, where the distribution of $R$ is specified in (1), and $\pi := 1 - \rho$,
where $\rho$ is the smallest non-negative root of the equation $f(\rho) = \rho$ and $f$
is the generating function specified in (2). Using the recursive formulas for
the distribution $F_k$ of the final size of a Reed-Frost epidemic in a homogeneous
population of size $k$ (see e.g. Andersson and Britton (2000: Section 1.2)),
numerical values of $R_0$ and $\pi$ are easily obtained for fixed values of $\beta$ and $\gamma$.

We are particularly interested in how $\pi$ and $R_0$ are affected when the
(asymptotic) clustering $c = (1 + \beta\gamma)^{-1}$ is varied, and, to be able to compare
results for different values of $c$, the mean degree $\mu = \beta\gamma^2$ in the graph is kept
fixed. In Figure 1, the parameters $R_0$ and $\pi$ are plotted against $c$ for three
different values of the infection probability $p$. Let us comment a bit on these
plots.

Figure 1: In the top figure, $R_0$ is plotted against $c$ for fixed $\mu = 4$ and for
three choices of $p$: $p = 0.2$ (solid line), 0.3 (- -), and 0.5 (- · -). The bottom
figure shows how the probability $\pi$ of explosion in the epidemic varies with $c$,
for fixed $\mu = 4$, and the same choices of $p$ as above.


First we investigate the value of $R_0$ in the limit as $c \to 0$. Since
$\mu$ is fixed, we have that $c = (1 + \mu/\gamma)^{-1} \to 0$ implies that
$\gamma \to 0$ as well. Asymptotically, the degree distribution in our random
intersection graph is compound Poisson with generating function

$$
g(s) = e^{\beta\gamma \left( e^{\gamma(s-1)} - 1 \right)},
$$

which converges to $e^{\mu(s-1)}$ as $\gamma \to 0$. The limiting degree
distribution as $c \to 0$ is hence Poisson($\mu$), and, after thinning the
graph, removing each edge independently with probability $1 - p$, the degrees
are Poisson distributed with mean $p\mu$. Since the graph obtained by such a
thinning can be thought of as representing the outcome of the epidemic, it is
reasonable to suspect that $R_0 = p\mu$ in the limit as $c \to 0$. Indeed, it
can be seen in the top plot in Figure 1 that $R_0 \to p\mu$ as $c \to 0$.
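
The limit can be spelled out in one line. This is a sketch that assumes the
compound Poisson form $g(s) = \exp\{\beta\gamma(e^{\gamma(s-1)} - 1)\}$
(a Poisson($\beta\gamma$) number of groups, each contributing a
Poisson($\gamma$) number of neighbours) and uses $\beta\gamma = \mu/\gamma$,
which holds because $\mu = \beta\gamma^2$ is fixed:

$$
\beta\gamma \left( e^{\gamma(s-1)} - 1 \right)
= \frac{\mu}{\gamma} \left( \gamma(s-1) + O(\gamma^2) \right)
= \mu(s-1) + O(\gamma) \longrightarrow \mu(s-1)
\quad \text{as } \gamma \to 0.
$$

Thinning each edge independently replaces $s$ by $1 - p + ps$, so the
limiting generating function becomes $e^{\mu(1 - p + ps - 1)} = e^{p\mu(s-1)}$,
which is that of a Poisson($p\mu$) distribution.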

In the top plot in Figure 1, it can also be seen that $R_0$ increases with
$c$, that is, higher clustering makes it easier for epidemics to take off.
This is in line with findings in Newman (2003:2). Let us give a heuristic
explanation of why this should be the case: First note that, since the mean
degree $\mu$ in the graph is fixed, an increase in
$c = (1 + \beta\gamma)^{-1}$ is equivalent to an increase in $\gamma$ and a
decrease in $\beta$ of the order $\gamma^{-2}$. Also, recall that the mean
number of groups that an individual is a member of is $\beta\gamma$ and the
mean group size is $\gamma$. Hence, increased clustering with fixed mean
degree means that individuals are members of fewer but larger groups.
Combining this with the observation that the probability for an individual to
avoid infection from some index case with whom he/she shares a group
decreases geometrically with the group size, it follows that it should be
easier for an epidemic to take off when the clustering is large. In fact, we
have that $R_0 \to \mu$ as $c \to 1$, that is, in the limit of large
clustering, the infection probability $p$ does not matter (as long as it is
positive) for the value of $R_0$.
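
For concreteness, the reparametrization behind this heuristic can be written
out. It follows directly from the two relations $\mu = \beta\gamma^2$ and
$c = (1 + \beta\gamma)^{-1}$:

$$
\beta\gamma = \frac{1 - c}{c}, \qquad
\gamma = \frac{\mu}{\beta\gamma} = \frac{\mu c}{1 - c}, \qquad
\beta = \frac{\mu}{\gamma^2} = \frac{(1 - c)^2}{\mu c^2}.
$$

With $\mu = 4$, for example, $c = 0.5$ gives $\gamma = 4$ and
$\beta\gamma = 1$, while $c = 0.8$ gives $\gamma = 16$ and
$\beta\gamma = 0.25$: on average a quarter as many groups per individual,
each four times as large.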

The bottom plot in Figure 1 shows how the probability $\pi$ of explosion in
the epidemic varies with $c$. For instance, it can be seen that $\pi \to 0$
as $c \to 1$. In Section 4, we argued that the relative size of the largest
component in a graph with maximal clustering is 0 in the limit of large graph
size. If the probability of explosion in the epidemic coincides with the
relative size of the largest component in the graph representing the outcome
of the epidemic, then indeed it follows from this that $\pi \to 0$ as
$c \to 1$. Furthermore, it is interesting to note that the decrease of $\pi$
towards 0 is not monotone for all values of $p$. Clearly, if a low value of
$c$ prevents explosion, while explosion is possible for a larger value of $c$
(this is the case for instance for $p = 0.2$), then we will see an increase
in $\pi$ from 0 to a positive value when the threshold is passed. But, as the
curve for $p = 0.3$ reveals, even if $\pi$ is positive already at $c = 0$, it
can be the case that it increases with $c$ in some interval before it starts
to decrease.
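
The behaviour of $\pi$ can be explored with a short numerical sketch. The
code below is illustrative only and rests on assumptions: the offspring
generating function is taken to be $f(s) = \exp\{\beta\gamma(E[s^R] - 1)\}$,
where $R$ is the number of individuals infected in a group with one initial
infective and a Poisson($\gamma$) number of susceptibles, and the smallest
non-negative root of $f(\rho) = \rho$ is found by iterating $f$ from 0. The
function names are hypothetical.

```python
import math


def final_size_dist(k, p):
    """Final size distribution of a Reed-Frost epidemic with one initial
    infective and k susceptibles (forward pass over the chain's states)."""
    q = 1.0 - p
    visit = [[0.0] * (k + 2) for _ in range(k + 1)]
    visit[k][1] = 1.0
    final = [0.0] * (k + 1)
    for s in range(k, -1, -1):
        for i in range(1, k + 2):
            w = visit[s][i]
            if w == 0.0:
                continue
            escape = q ** i  # a susceptible avoids all i infectives
            for j in range(s + 1):
                pr = math.comb(s, j) * (1.0 - escape) ** j * escape ** (s - j)
                if j == 0:
                    final[k - s] += w * pr
                else:
                    visit[s - j][j] += w * pr
    return final


def outbreak_pmf(gamma, p, tol=1e-12, k_max=150):
    """ASSUMPTION: pmf of R, mixing final sizes over k ~ Poisson(gamma)."""
    pmf, mass, k = [], 0.0, 0
    weight = math.exp(-gamma)
    while mass < 1.0 - tol and k <= k_max:
        dist = final_size_dist(k, p)
        pmf.extend([0.0] * (len(dist) - len(pmf)))
        for r, pr in enumerate(dist):
            pmf[r] += weight * pr
        mass += weight
        k += 1
        weight *= gamma / k
    return [x / mass for x in pmf]  # renormalize the truncated mixture


def explosion_prob(mu, c, p, iters=20000, eps=1e-13):
    """pi = 1 - rho, with rho the smallest root of f(rho) = rho in [0, 1]."""
    gamma = mu * c / (1.0 - c)
    groups = (1.0 - c) / c  # beta * gamma, mean number of groups
    pmf = outbreak_pmf(gamma, p)
    rho = 0.0
    for _ in range(iters):
        h = sum(pr * rho ** r for r, pr in enumerate(pmf))  # E[rho^R]
        new = math.exp(groups * (h - 1.0))  # assumed offspring pgf f(rho)
        done = abs(new - rho) < eps
        rho = new
        if done:
            break
    return max(0.0, 1.0 - rho)
```

Iterating $f$ from 0 produces an increasing sequence that converges to the
smallest non-negative fixed point, which is why no root bracketing is needed.
Convergence is slow close to the threshold $R_0 = 1$, so the iteration cap
may need to be raised there.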

6 Discussion

In the present paper, we have analyzed how the clustering in a random
network affects how an infectious disease propagates in the network, assuming
the size of the network to be large. In particular, using a random
intersection graph construction, we have rigorously derived the limiting
probability of an explosion in the epidemic and a threshold parameter
indicating if this probability is 0 or positive.

The motivation for analyzing an epidemic on a random network with pos- 
itive clustering is of course that most empirical social networks manifest posi- 
tive clustering, so predictions based on epidemic models neglecting such clus- 
tering, i.e. most epidemic models, must be interpreted with caution. There 
are of course several other features in empirical networks, not considered in 
the present paper, that should also be taken into account for predictions to be 
reliable. One such feature is the degree distribution, which in many social net- 
works has been observed to follow a power-law distribution. The graph model 
used in this paper gives compound Poisson distributions for the degrees, but 
the model is generalized in Deijfen and Kets (2007) to allow for power-law 
degree distributions. It would be interesting to study how an epidemic on 
such a generalized graph is affected by the exponent in the power-law. An- 
other feature that has been observed in many social networks is positive degree 
correlation, that is, individuals with high (low) degree tend to be connected 
to other individuals with high (low) degree. Because of the group structure, 
this is likely to be the case in a random intersection graph, but it remains to 
quantify the correlation. 

A possible generalization of the studied model would be to distinguish
between different types of individuals, and to assume that both network
characteristics and transmission probabilities depend on the type of an
individual; see e.g. Ball and Clancy (1993). Another extension, motivated by
real-world epidemics, is to leave the Reed-Frost paradigm, in which the
events that different neighbors of a given infective become infected are
independent. If for example the infectious period is taken to be random, then
these events are positively correlated; see e.g. Andersson and Britton
(2000). Unfortunately, relaxing the independence assumption makes the
analysis of the model much more complicated.

Perhaps the most obvious continuation of the present work is however to 
derive fully rigorous results about the final size of the epidemic in case of 
explosion. The (relative) final size of the epidemic then most likely coincides 
with the probability of explosion, a quantity derived in the present paper, but 
a proof of this is still missing. 